Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Modeling textual or visual information with vector representations trained
from large language or visual datasets has been successfully explored in recent
years. However, tasks such as visual question answering require combining these
vector representations with each other. Approaches to multimodal pooling
include element-wise product or sum, as well as concatenation of the visual and
textual representations. We hypothesize that these methods are not as
expressive as an outer product of the visual and textual vectors. As the outer
product is typically infeasible due to its high dimensionality, we instead
propose utilizing Multimodal Compact Bilinear pooling (MCB) to efficiently and
expressively combine multimodal features. We extensively evaluate MCB on the
visual question answering and grounding tasks. We consistently show the benefit
of MCB over ablations without MCB. For visual question answering, we present an
architecture which uses MCB twice, once for predicting attention over spatial
features and again to combine the attended representation with the question
representation. This model outperforms the state-of-the-art on the Visual7W
dataset and the VQA challenge. Accepted to EMNLP 2016.
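To make the fusion step concrete, here is a minimal sketch of compact bilinear pooling in the spirit of MCB: each modality's vector is projected with Count Sketch, and the two sketches are combined by FFT-based circular convolution, which approximates a sketch of their outer product. The output dimension, seeding, and function names below are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def count_sketch(v, h, s, d):
    # Project v into d dimensions: bucket indices h and signs s are
    # drawn once per modality and then held fixed.
    y = np.zeros(d)
    np.add.at(y, h, s * v)
    return y

def mcb(v_img, v_txt, d=16000, seed=0):
    # Sketch each vector, then multiply in the frequency domain:
    # the element-wise product of FFTs equals circular convolution of
    # the sketches, approximating a Count Sketch of the outer product.
    rng = np.random.default_rng(seed)
    h1 = rng.integers(0, d, v_img.size); s1 = rng.choice([-1, 1], v_img.size)
    h2 = rng.integers(0, d, v_txt.size); s2 = rng.choice([-1, 1], v_txt.size)
    y1 = count_sketch(v_img, h1, s1, d)
    y2 = count_sketch(v_txt, h2, s2, d)
    return np.real(np.fft.ifft(np.fft.fft(y1) * np.fft.fft(y2)))
```

In the architecture the abstract describes, this fusion is applied twice: once to predict attention over spatial features and once to combine the attended visual representation with the question representation.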
Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)
Deep models are the de facto standard in visual decision problems due to their
impressive performance on a wide array of visual tasks. On the other hand,
their opaqueness has led to a surge of interest in explainable systems. In this
work, we emphasize the importance of model explanation in various forms such as
visual pointing and textual justification. The lack of data with justification
annotations is one of the bottlenecks of generating multimodal explanations.
Thus, we propose two large-scale datasets with annotations that visually and
textually justify a classification decision, for activity recognition (ACT-X)
and for visual question answering (VQA-X). We also introduce a multimodal
methodology for generating visual and textual explanations simultaneously. We
quantitatively show that training with the textual explanations not only yields
better textual justification models, but also models that better localize the
evidence that supports their decision.
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
Deep models that are both effective and explainable are desirable in many
settings; prior explainable models have been unimodal, offering either
image-based visualization of attention weights or text-based generation of
post-hoc justifications. We propose a multimodal approach to explanation, and
argue that the two modalities provide complementary explanatory strengths. We
collect two new datasets to define and evaluate this task, and propose a novel
model which can provide joint textual rationale generation and attention
visualization. Our datasets define visual and textual justifications of a
classification decision for activity recognition tasks (ACT-X) and for visual
question answering tasks (VQA-X). We quantitatively show that training with the
textual explanations not only yields better textual justification models, but
also better localizes the evidence that supports the decision. We also
qualitatively show cases where visual explanation is more insightful than
textual explanation, and vice versa, supporting our thesis that multimodal
explanation models offer significant benefits over unimodal approaches.
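As a rough illustration of the multitask idea, not the authors' actual architecture, the sketch below shares a single spatial attention map between an answer classifier and a textual-explanation decoder, so the explanation loss also shapes where the model attends; all layer sizes, names, and the LSTM decoder are hypothetical.

```python
import torch
import torch.nn as nn

class MultimodalExplainer(nn.Module):
    # Hypothetical sketch: one attention distribution feeds both the
    # answer head and the textual-justification decoder.
    def __init__(self, d_img=512, d_q=512, d_h=512, n_ans=1000, vocab=10000):
        super().__init__()
        self.att = nn.Linear(d_img + d_q, 1)        # spatial attention scores
        self.ans_head = nn.Linear(d_img + d_q, n_ans)
        self.ctx_proj = nn.Linear(d_img + d_q, d_h)
        self.embed = nn.Embedding(vocab, d_h)
        self.decoder = nn.LSTM(d_h, d_h, batch_first=True)
        self.word_head = nn.Linear(d_h, vocab)

    def forward(self, img_feats, q_vec, expl_tokens):
        # img_feats: (B, R, d_img) regional features; q_vec: (B, d_q)
        B, R, _ = img_feats.shape
        q_tiled = q_vec.unsqueeze(1).expand(B, R, q_vec.size(-1))
        scores = self.att(torch.cat([img_feats, q_tiled], -1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=1)        # visual explanation
        v_att = (alpha.unsqueeze(-1) * img_feats).sum(1)
        joint = torch.cat([v_att, q_vec], -1)
        ans_logits = self.ans_head(joint)
        # Condition the textual justification on the same joint context.
        h0 = torch.tanh(self.ctx_proj(joint)).unsqueeze(0)
        out, _ = self.decoder(self.embed(expl_tokens),
                              (h0, torch.zeros_like(h0)))
        return ans_logits, self.word_head(out), alpha
```

Training would minimize a cross-entropy answer loss plus a cross-entropy word loss over the explanation tokens; the abstract's quantitative claim is that adding the textual loss also improves how well the attention map alpha localizes the evidence.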
Vision and Language Understanding Through Generative Modeling
Language is a powerful representation for capturing knowledge and information about our world. It excels at expressing discrete concepts, such as objects and their attributes and the relationships between them, in a very compact manner, owing to its extremely high level of abstraction. Language is the primary means by which we communicate, comprehend, and express our thoughts and ideas, and it lies at the very core of human intelligence. With the advent of powerful generative models, machines have also begun to comprehend and generate natural language with notable fluency and creativity. However, they lack “grounding”, a direct tie to the visual world. Vision plays a pivotal role in our comprehension and production of language. When we describe a scene, understand instructions, or engage in a dialogue, visual context significantly aids our interpretation and generation of language. This highlights the need to integrate vision into generative modeling.
Chapters 1 and 2 delve into the image-to-text domain, spotlighting the importance of a multimodal approach to text generation. In Chapter 1, we explore how generating textual rationales with attention visualizations can enhance model transparency for visual question answering. In Chapter 2, we build generative models that abandon traditional left-to-right sequencing in favor of an unsupervised technique for determining optimal generation orders. Chapters 3 and 4 shift the focus to text-to-image generation. In Chapter 3, we introduce a training-free framework that combines linguistic cues with reference images, allowing for controllable image synthesis using denoising diffusion probabilistic models. Lastly, Chapter 4 emphasizes the importance of preserving object shapes in text-based image editing, proposing a unique mechanism that augments text-to-image models to be more faithful to input masks and text prompts.
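For readers unfamiliar with the generative primitive behind Chapters 3 and 4, here is a minimal sketch of one DDPM reverse (denoising) step; training-free editing frameworks typically intervene in this loop (for example, by blending in reference-image latents or applying masks), though the exact mechanism shown here is an assumption, not the dissertation's method.

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_model, betas):
    # One ancestral sampling step of a denoising diffusion probabilistic
    # model (variance set to beta_t); eps_model predicts the added noise.
    beta_t = betas[t]
    alpha_bar_t = np.prod(1.0 - betas[: t + 1])
    eps = eps_model(x_t, t)
    mean = (x_t - beta_t / np.sqrt(1.0 - alpha_bar_t) * eps) \
           / np.sqrt(1.0 - beta_t)
    if t == 0:
        return mean                      # no noise added at the final step
    return mean + np.sqrt(beta_t) * np.random.standard_normal(x_t.shape)
```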
Statistical Analysis of Low-latitude Pi2 Pulsations Observed at Bohyun Station in Korea
We statistically investigated the properties of low-latitude Pi2 pulsations using Bohyun (BOH, Mlat = 29.8°, L = 1.35) ground magnetometer data from 2008. For this 1-year interval, 582 Pi2 events were identified while BOH was on the nightside, between 1800 and 0600 local time. We found the following Pi2 characteristics. (1) The occurrence distribution of Pi2s is relatively constant across local time. (2) Pi2 frequency varies with local time: pulsations in the postmidnight sector have higher frequencies than those in the premidnight sector. (3) Pi2 power in the premidnight sector is stronger than in the postmidnight sector. (4) Pi2 frequency is positively correlated with solar wind speed and the AE index. (5) Pi2 power shows no clear correlation with solar wind parameters, indicating that Pi2 power is not controlled by external sources. (6) The most probable time between Pi2 onsets is Δt ~ 37.5 min, interpreted as the period between Pi2 pulsations when they occur cyclically. We suggest that Δt ~ 37.5 min reflects the recurrence period of reconnection of open field lines in the tail lobe.
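As a sketch of the waiting-time statistic behind point (6), the snippet below computes the distribution of intervals between consecutive Pi2 onsets and reports its most probable value; the function name, input format, and 5-minute binning are assumptions, not the paper's procedure.

```python
import numpy as np

def most_probable_onset_interval(onset_times_hr, bin_width_min=5.0):
    # Intervals between consecutive onsets, converted to minutes.
    dt_min = np.diff(np.sort(onset_times_hr)) * 60.0
    bins = np.arange(0.0, dt_min.max() + bin_width_min, bin_width_min)
    counts, edges = np.histogram(dt_min, bins=bins)
    k = np.argmax(counts)
    return 0.5 * (edges[k] + edges[k + 1])   # bin-centre mode, in minutes
```

Applied to the 582 events analyzed here, a statistic of this kind yields the reported most probable interval of Δt ~ 37.5 min.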